Mining Association Rules from Unstructured Documents
Abstract
This paper presents EART (Extract Association Rules from Text), a system for discovering association rules from collections of unstructured documents. EART processes text only, not images or figures. It discovers association rules among the keywords that label a collection of textual documents. The main characteristic of EART is that it integrates XML technology (to transform unstructured documents into structured documents) with an Information Retrieval scheme (TF-IDF) and a Data Mining technique for association rule extraction. EART relies on word features to extract association rules. It consists of four phases: a structure phase, an index phase, a text mining phase, and a visualization phase. Our work analyzes the keywords in the extracted association rules in two ways: through the co-occurrence of the keywords within one sentence of the original text, and through the presence of the keywords in a sentence without co-occurrence. Experiments were carried out on a collection of scientific documents selected from MEDLINE that relate to the outbreak of the H5N1 avian influenza virus.
Keywords: Association rules, information retrieval, knowledge discovery in text, text mining.
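As a rough illustration of how TF-IDF keyword selection can feed association rule extraction, the following Python sketch scores terms per document, keeps the top-scoring ones as keywords, and then derives single-keyword rules filtered by support and confidence. It is not the EART implementation: the function names `tfidf_keywords` and `pairwise_rules`, the `top_k` cutoff, the thresholds, and the toy documents are all assumptions made for illustration, and the XML structuring and visualization phases are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_keywords(docs, top_k=3):
    """Return, for each tokenized document, the set of its top_k terms by TF-IDF.

    Hypothetical helper: top_k is an illustrative cutoff, not a value from the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keyword_sets = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in tf}
        # Highest score first; ties broken alphabetically so results are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keyword_sets.append(set(ranked[:top_k]))
    return keyword_sets


def pairwise_rules(keyword_sets, min_support=0.5, min_confidence=0.7):
    """Mine rules lhs -> rhs between single keywords (assumed thresholds)."""
    n = len(keyword_sets)
    item_count = Counter(k for ks in keyword_sets for k in ks)
    pair_count = Counter(
        pair for ks in keyword_sets for pair in combinations(sorted(ks), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of documents labeled with both keywords
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


if __name__ == "__main__":
    # Toy token lists standing in for indexed documents (invented data).
    docs = [
        ["h5n1", "virus", "outbreak", "poultry"],
        ["h5n1", "virus", "vaccine"],
        ["h5n1", "poultry", "outbreak"],
        ["vaccine", "trial", "dose"],
    ]
    for lhs, rhs, support, confidence in pairwise_rules(tfidf_keywords(docs)):
        print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Restricting rules to keyword pairs keeps the sketch short; a fuller treatment would mine larger frequent itemsets, for example with an Apriori-style search.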
Similar resources
CSCR001: Literature Survey
My PhD research focuses on Text Mining (TM), one major school in Knowledge Discovery in Data (KDD), and in particular the task of classification/categorization of documents using novel algorithms for the identification of hidden patterns within these documents. Two significant techniques of Data Mining (DM), another well-known major school in KDD, will be utilized to support the research: Assoc...
A Strategy to Compromise Handwritten Documents Processing and Retrieving Using Association Rules Mining
Massive amounts of new information are being created: the world's data doubles every 18 months, and 80-90% of all data is held in various unstructured formats. Useful information can be derived from this unstructured data. The aim of this research is to present a framework for handling handwritten documents in all their forms. Since handwritten documents are unstructured data, the objectives of...
Text Mining: Extraction of Interesting Association Rule with Frequent Itemsets Mining for Korean Language from Unstructured Data
Text mining is a specific method for extracting knowledge from structured and unstructured data. The knowledge extracted by the text mining process can then be put to further use and discovery. This paper presents a method for extracting information from unstructured text data and discusses the importance of Association Rules Mining, specifically for Korean-language text, and also NLP (Natural Language ...
Relevant Characteristics Extraction from Semantically Unstructured Data (PhD thesis title: "Data Mining for Unstructured Data")
Most real-world data collections are in text format. Such data are considered semi-structured because they have little organized structure. Modeling and implementation on semi-structured data from recent databases have grown continually in recent years. Moreover, information retrieval applications, such as indexing methods for text documents, have been adapted in order to work...
Textmining: Generating association rules from textual data
Text mining is an emerging research area whose goal is to discover additional information from hidden patterns in large unstructured textual collections. Hence, given a collection of text documents, most text mining approaches perform knowledge-discovery operations on labels associated with each document, which are usually keywords that represent the result of non-trivial keyword-labeling pro...